Case Studies in Air Quality in Detroit Metro
Research Scientist, ISciences, LLC
Preface: A Note on Data Interpretation and Ethical Considerations
The topics of environmental justice, institutional racism, socioeconomic conditions, and pollution are complex and multifaceted. The data and analyses presented in this lesson are not intended to draw definitive conclusions or suggest scientific evidence of cause and effect relationships. Rather, they are meant to equip you with the skills to investigate data and perform analyses that could be applied to your local communities.
As you work through this lesson, remember that correlation does not imply causation. The patterns you may observe in the data could be influenced by factors not represented in these datasets. Approach your findings with caution, and consider the broader historical, social, and economic contexts that shape environmental and health outcomes in different communities.
This lesson will empower you with data literacy and analytical skills. We encourage you to use these tools and skills responsibly. Consider the ethical implications of your analyses and the potential impact on the communities represented in the data. When drawing insights or making decisions based on such analyses, it’s crucial to involve community stakeholders, consider multiple perspectives, and seek additional expertise when necessary.
# Define counties of interest
counties = ['Wayne', 'Oakland', 'Macomb']
# Fetch the county boundaries for Michigan from pygris dataset
metro_counties = pygris.counties(state="MI", year=2022)
# Filter the DataFrame to only include counties in the Detroit metro area
detroit_metro = metro_counties[metro_counties['NAME'].isin(counties)]
# Combine the geometries of selected counties into a single polygon
detroit_metro = detroit_metro.dissolve(by='STATEFP')
# Convert to GeoDataFrame and ensure coordinate reference system is EPSG:4269
detroit_metro = gpd.GeoDataFrame(detroit_metro, geometry='geometry', crs='EPSG:4269')
# Obtain the total bounding box for the Detroit metro area polygon
bbox = detroit_metro.total_bounds

pygris provides direct access to Census Bureau TIGER/Line shapefiles.
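Before using a bounding box to request rasters or basemap tiles, it is common to pad it slightly so features on the study-area edge are not clipped. A minimal sketch (the helper name and pad fraction are illustrative, not part of the lesson):

```python
def pad_bbox(bounds, frac=0.05):
    """Expand a (minx, miny, maxx, maxy) bounding box by a fraction of its size."""
    minx, miny, maxx, maxy = bounds
    dx, dy = (maxx - minx) * frac, (maxy - miny) * frac
    return (minx - dx, miny - dy, maxx + dx, maxy + dy)

# Example with a unit square, padded by 10% on each side
padded = pad_bbox((0.0, 0.0, 1.0, 1.0), frac=0.1)
```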
# Initialize list to hold TRI facility data for each county
tri_data = []
# Loop through each county to fetch TRI data separately
for county in counties:
    # Construct the API URL for the current county
    api_url = f"https://data.epa.gov/efservice/tri_facility/state_abbr/MI/county_name/{county}/JSON"
    response = requests.get(api_url)
    # Check if the response was successful
    if response.status_code == 200:
        county_data = response.json()
        # Append the current county's TRI data to the overall list
        tri_data.extend(county_data)
    else:
        print(f"Failed to fetch data for {county} County. Status code: {response.status_code}")
# Create a Pandas DataFrame from the collected TRI data
tri_df = pd.DataFrame(tri_data)
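Before the facilities can be mapped, the plain DataFrame needs point geometries. A hedged sketch of that step — the coordinate column names (`latitude`, `longitude`) are assumptions about the Envirofacts response and should be checked against `tri_df.columns`:

```python
import pandas as pd
import geopandas as gpd

def tri_to_gdf(tri_df, lat_col="latitude", lon_col="longitude"):
    """Build a point GeoDataFrame from a TRI facility table.

    Column names are assumptions; inspect tri_df.columns for the real ones.
    """
    df = tri_df.copy()
    # Coerce coordinates to numeric and drop facilities without valid locations
    df[lat_col] = pd.to_numeric(df[lat_col], errors="coerce")
    df[lon_col] = pd.to_numeric(df[lon_col], errors="coerce")
    df = df.dropna(subset=[lat_col, lon_col])
    return gpd.GeoDataFrame(
        df,
        geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
        crs="EPSG:4269",  # NAD83, matching the county boundaries above
    )
```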
# Reproject TRI facility data to Web Mercator for map visualization
gdf_tri_form_r_bm = gdf_tri_form_r.to_crs(epsg=3857)
# Create scatter plot with graduated symbols based on air release amount
scatter = ax.scatter(gdf_tri_form_r_bm.geometry.x, gdf_tri_form_r_bm.geometry.y,
                     s=gdf_tri_form_r_bm['LOG_AIR_RELEASE']*20,  # Size by log of release
                     c='orangered',       # Color of symbols
                     edgecolor='yellow',  # Edge color for visibility
                     alpha=0.7)           # Transparency

geopandas enables visualization of facility air release amounts with graduated symbols.
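The `LOG_AIR_RELEASE` column driving the symbol sizes can be derived with a log1p transform, which keeps zero-release facilities plottable while compressing large outliers. A sketch — the source column name `total_air_release` is an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical per-facility air-release totals (lbs); column name is an assumption
releases = pd.Series([0.0, 10.0, 1_000.0, 250_000.0], name="total_air_release")

# log1p(x) = log(1 + x): zero stays zero, the dynamic range shrinks
log_release = np.log1p(releases)
```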
from rasterio.enums import Resampling
# Select the 'Overall' layer from SVI dataset
svi_overall = svi_detroit.sel(layer='Overall')
svi_overall = svi_overall.rio.write_crs("EPSG:4326")
# Reproject SVI to match the TRI air release raster
svi_reprojected = svi_overall.rio.reproject_match(
    air_release_raster_da)
# Transform and normalize the air release data
# 1. Log transform to reduce impact of outliers
air_release_log = np.log1p(air_release_disaggregated)
# 2. Scale log values to 0-1 range to match SVI
air_release_scaled = (air_release_log - air_release_log.min()) / \
                     (air_release_log.max() - air_release_log.min())
# Multiply scaled air release with SVI to create vulnerability index
# High values = areas with both high pollution and high vulnerability
vulnerability_indicator = air_release_scaled * svi_reprojected
# Find locations with highest combined vulnerability
vulnerability_df = vulnerability_indicator.to_dataframe(
    name='index').reset_index()
top_10 = vulnerability_df.sort_values(
    'index', ascending=False).head(10)

rioxarray enables raster algebra between air pollution and SVI layers.
Identify the most vulnerable populations with values closest to 1.
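The top-10 extraction above can be sketched in plain NumPy: multiply two unit-scaled layers, flatten, and take the largest cells. The arrays here are toy stand-ins for the scaled air-release and SVI rasters:

```python
import numpy as np

rng = np.random.default_rng(0)
air_scaled = rng.random((4, 5))   # stand-in for the 0-1 scaled air releases
svi = rng.random((4, 5))          # stand-in for the 0-1 SVI layer

vulnerability = air_scaled * svi  # high only where both inputs are high

# Flat indices of the 10 largest cells, highest first
flat_top = np.argsort(vulnerability.ravel())[::-1][:10]
# Recover (row, col) positions in the original grid
rows, cols = np.unravel_index(flat_top, vulnerability.shape)
```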
from scipy.interpolate import griddata
# Access CDC PLACES data via GeoJSON API
url = "https://data.cdc.gov/resource/cwsq-ngmh.geojson"
county_filter = " OR ".join([f"countyname = '{county}'"
                             for county in counties])
params = {
    "$where": f"stateabbr = 'MI' AND ({county_filter})",
    "$limit": 50000
}
response = requests.get(url, params=params)
gdf = gpd.read_file(response.text)
# Filter for asthma data
gdf_asthma = gdf[gdf['measure'] == 'Current asthma among adults'].copy()
gdf_asthma['data_value'] = pd.to_numeric(
gdf_asthma['data_value'], errors='coerce')
# Prepare point data for spatial interpolation
X = gdf_asthma.geometry.x.values
Y = gdf_asthma.geometry.y.values
Z = gdf_asthma['data_value'].values
mask = ~np.isnan(Z)
X, Y, Z = X[mask], Y[mask], Z[mask]
# Create interpolation grid
grid_resolution = 0.025
x_min, y_min, x_max, y_max = gdf_asthma.total_bounds
grid_x = np.arange(x_min, x_max, grid_resolution)
grid_y = np.arange(y_min, y_max, grid_resolution)
grid_xx, grid_yy = np.meshgrid(grid_x, grid_y)
# Interpolate asthma prevalence surface
grid_z = griddata(np.column_stack((X, Y)), Z,
                  (grid_xx, grid_yy), method='linear')

scipy.interpolate creates continuous health surfaces from point data.
from pysal.explore import esda
from pysal.lib import weights
from splot.esda import moran_scatterplot
import shapely
# Create GeoDataFrame with aligned air release and asthma data
# Build point coordinates in the same row-major (y, x) order that
# .flatten() uses, assuming the raster dims are ordered (y, x)
xx, yy = np.meshgrid(air_release_aligned.x.values,
                     air_release_aligned.y.values)
df = gpd.GeoDataFrame({
    'air_release': air_release_aligned.values.flatten(),
    'asthma': ds_clipped.asthma.values.flatten(),
    'geometry': gpd.points_from_xy(xx.ravel(), yy.ravel())
})
df = df.dropna() # Remove rows with NaN values
# Create spatial weights matrix (k-nearest neighbors)
w = weights.distance.KNN.from_dataframe(df, k=8)
w.transform = 'r' # Row-standardize weights
# Calculate Bivariate Moran's I for air release vs asthma
moran_bv = esda.Moran_BV(df['air_release'], df['asthma'], w)
print(f"Bivariate Moran's I: {moran_bv.I}")
print(f"p-value: {moran_bv.p_sim}")
# Test shows significant spatial relationship between
# air releases and asthma prevalence (p < 0.001)

pysal enables spatial autocorrelation analysis to assess pollution-health relationships.
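What Moran's I measures can be illustrated without pysal. With mean-centered values z, weights matrix W, and S0 the sum of all weights, I = (n/S0) * (z'Wz)/(z'z): a clustered pattern scores positive, an alternating one negative. The line-of-cells weights below are a toy illustration, not the lesson's k-nearest-neighbor graph:

```python
import numpy as np

def morans_i(values, w):
    """Univariate Moran's I for a 1-D array and an n x n spatial weights matrix."""
    z = values - values.mean()
    n, s0 = len(values), w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Rook adjacency along a line of 6 cells
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])    # low side / high side
alternating = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # checkerboard
```

esda's `Moran_BV` extends the same idea by cross-pairing two variables (here, air releases at a cell against asthma prevalence at its neighbors) and assessing significance with the permutation-based `p_sim`.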
Lessons detailing the 2014 California drought, the 2023 Canadian wildfires, and the 2022 Central Plains flash drought
TOPSTSCHOOL curriculum materials available at: https://ciesin-geospatial.github.io/TOPSTSCHOOL/
American Association of Geographers 2025 Annual Meeting | Detroit, MI
Social Vulnerability Index
SEDAC’s SVI is a gridded version of the CDC’s census-tract-level Social Vulnerability Index data.